Overview

Dataset statistics

Number of variables: 12
Number of observations: 4898
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 937
Duplicate rows (%): 19.1%
Total size in memory: 459.3 KiB
Average record size in memory: 96.0 B

Variable types

NUM: 12

Reproduction

Analysis started: 2020-08-18 18:17:43.247354
Analysis finished: 2020-08-18 18:17:59.983353
Duration: 16.74 seconds
Version: pandas-profiling v2.8.0
Command line: pandas_profiling --config_file config.yaml [YOUR_FILE.csv]
Download configuration: config.yaml

Warnings

Duplicates: dataset has 937 (19.1%) duplicate rows
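
The duplicate figure can be sanity-checked by hand. The sketch below uses only the standard library and assumes a row counts as a duplicate when it exactly repeats an earlier row, so each first occurrence is not counted. The helper `count_duplicate_rows` and the toy rows are hypothetical and not part of the report.

```python
from collections import Counter


def count_duplicate_rows(rows):
    """Return (duplicates, share); each first occurrence is not a duplicate."""
    counts = Counter(tuple(row) for row in rows)
    duplicates = sum(n - 1 for n in counts.values())
    share = duplicates / len(rows) if rows else 0.0
    return duplicates, share


# Toy data (hypothetical): the first row appears three times.
rows = [
    (7.0, 0.27, 6),
    (6.3, 0.30, 6),
    (7.0, 0.27, 6),
    (7.0, 0.27, 6),
]
duplicates, share = count_duplicate_rows(rows)
print(duplicates, f"{share:.1%}")

# Consistency check on the reported figures: 937 of 4898 rows.
reported_share = 937 / 4898
print(f"{reported_share:.1%}")
```

Under this definition, 937 duplicates among 4898 observations leave 3961 distinct rows.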

Variables

fixed_acidity
Real number (ℝ≥0)

Distinct count: 68
Unique (%): 1.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 6.854787668436097
Minimum: 3.8
Maximum: 14.2
Zeros: 0
Zeros (%): 0.0%
Memory size: 38.4 KiB

Quantile statistics

Minimum: 3.8
5-th percentile: 5.6
Q1: 6.3
Median: 6.8
Q3: 7.3
95-th percentile: 8.3
Maximum: 14.2
Range: 10.4
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 0.8438682277
Coefficient of variation (CV): 0.1231063993
Kurtosis: 2.172178465
Mean: 6.854787668
Median Absolute Deviation (MAD): 0.5
Skewness: 0.6477514746
Sum: 33574.75
Variance: 0.7121135857
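
Several of the figures above are derived from the others. As a rough illustration, the sketch below recomputes the range, IQR, coefficient of variation, variance and sum for fixed_acidity from the reported values. The variable names are illustrative only, and the formulas are the standard definitions, assumed here rather than taken from the report.

```python
# Values copied from the fixed_acidity statistics above.
n = 4898                       # number of observations
mean = 6.854787668436097
std = 0.8438682277
minimum, maximum = 3.8, 14.2
q1, q3 = 6.3, 7.3

value_range = maximum - minimum   # Range
iqr = q3 - q1                     # Interquartile range (IQR)
cv = std / mean                   # Coefficient of variation (CV)
variance = std ** 2               # Variance
total = mean * n                  # Sum

print(f"range={value_range:.1f} iqr={iqr:.1f} cv={cv:.10f}")
print(f"variance={variance:.10f} sum={total:.2f}")
```

The same relationships hold for every variable in this section.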
[Figure: histogram with fixed-size bins (bins=10)]

Most frequent values

Value               Count  Frequency (%)
6.8                 308    6.3%
6.6                 290    5.9%
6.4                 280    5.7%
6.9                 241    4.9%
6.7                 236    4.8%
7                   232    4.7%
6.5                 225    4.6%
7.2                 206    4.2%
7.1                 200    4.1%
7.4                 194    4.0%
6.2                 192    3.9%
6.3                 188    3.8%
6                   184    3.8%
7.3                 178    3.6%
6.1                 155    3.2%
7.6                 153    3.1%
7.5                 123    2.5%
5.8                 121    2.5%
5.9                 103    2.1%
7.8                 93     1.9%
7.7                 93     1.9%
5.7                 88     1.8%
8                   80     1.6%
7.9                 74     1.5%
5.6                 71     1.4%
Other values (43)   590    12.0%

Smallest values

Value               Count  Frequency (%)
3.8                 1      < 0.1%
3.9                 1      < 0.1%
4.2                 2      < 0.1%
4.4                 3      0.1%
4.5                 1      < 0.1%
4.6                 1      < 0.1%
4.7                 5      0.1%
4.8                 9      0.2%
4.9                 7      0.1%
5                   24     0.5%

Largest values

Value               Count  Frequency (%)
14.2                1      < 0.1%
11.8                1      < 0.1%
10.7                2      < 0.1%
10.3                2      < 0.1%
10.2                1      < 0.1%
10                  3      0.1%
9.9                 2      < 0.1%
9.8                 8      0.2%
9.7                 4      0.1%
9.6                 5      0.1%

volatile_acidity
Real number (ℝ≥0)

Distinct count: 125
Unique (%): 2.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.27824111882400976
Minimum: 0.08
Maximum: 1.1
Zeros: 0
Zeros (%): 0.0%
Memory size: 38.4 KiB

Quantile statistics

Minimum: 0.08
5-th percentile: 0.15
Q1: 0.21
Median: 0.26
Q3: 0.32
95-th percentile: 0.46
Maximum: 1.1
Range: 1.02
Interquartile range (IQR): 0.11

Descriptive statistics

Standard deviation: 0.1007945484
Coefficient of variation (CV): 0.3622561211
Kurtosis: 5.091625817
Mean: 0.2782411188
Median Absolute Deviation (MAD): 0.06
Skewness: 1.576979503
Sum: 1362.825
Variance: 0.01015954099
[Figure: histogram with fixed-size bins (bins=10)]

Most frequent values

Value               Count  Frequency (%)
0.28                263    5.4%
0.24                253    5.2%
0.26                240    4.9%
0.25                231    4.7%
0.22                229    4.7%
0.27                218    4.5%
0.23                216    4.4%
0.2                 214    4.4%
0.3                 198    4.0%
0.21                191    3.9%
0.32                182    3.7%
0.18                177    3.6%
0.19                170    3.5%
0.29                160    3.3%
0.31                148    3.0%
0.16                141    2.9%
0.17                140    2.9%
0.34                135    2.8%
0.33                134    2.7%
0.36                104    2.1%
0.15                88     1.8%
0.35                86     1.8%
0.37                65     1.3%
0.38                63     1.3%
0.39                61     1.2%
Other values (100)  791    16.1%

Smallest values

Value               Count  Frequency (%)
0.08                4      0.1%
0.085               1      < 0.1%
0.09                1      < 0.1%
0.1                 6      0.1%
0.105               6      0.1%
0.11                13     0.3%
0.115               3      0.1%
0.12                34     0.7%
0.125               3      0.1%
0.13                44     0.9%

Largest values

Value               Count  Frequency (%)
1.1                 1      < 0.1%
1.005               1      < 0.1%
0.965               1      < 0.1%
0.93                1      < 0.1%
0.91                1      < 0.1%
0.905               1      < 0.1%
0.85                1      < 0.1%
0.815               1      < 0.1%
0.785               1      < 0.1%
0.78                1      < 0.1%

citric_acid
Real number (ℝ≥0)

Distinct count: 87
Unique (%): 1.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.33419150673744386
Minimum: 0.0
Maximum: 1.66
Zeros: 19
Zeros (%): 0.4%
Memory size: 38.4 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0.17
Q1: 0.27
Median: 0.32
Q3: 0.39
95-th percentile: 0.54
Maximum: 1.66
Range: 1.66
Interquartile range (IQR): 0.12

Descriptive statistics

Standard deviation: 0.1210198042
Coefficient of variation (CV): 0.362127109
Kurtosis: 6.174900657
Mean: 0.3341915067
Median Absolute Deviation (MAD): 0.06
Skewness: 1.281920398
Sum: 1636.87
Variance: 0.01464579301
[Figure: histogram with fixed-size bins (bins=10)]

Most frequent values

Value               Count  Frequency (%)
0.3                 307    6.3%
0.28                282    5.8%
0.32                257    5.2%
0.34                225    4.6%
0.29                223    4.6%
0.26                219    4.5%
0.27                216    4.4%
0.49                215    4.4%
0.31                200    4.1%
0.33                183    3.7%
0.24                181    3.7%
0.36                177    3.6%
0.35                137    2.8%
0.25                136    2.8%
0.37                134    2.7%
0.38                122    2.5%
0.4                 117    2.4%
0.22                104    2.1%
0.39                101    2.1%
0.42                95     1.9%
0.23                83     1.7%
0.41                82     1.7%
0.2                 70     1.4%
0.21                66     1.3%
0.44                63     1.3%
Other values (62)   903    18.4%

Smallest values

Value               Count  Frequency (%)
0                   19     0.4%
0.01                7      0.1%
0.02                6      0.1%
0.03                2      < 0.1%
0.04                12     0.2%
0.05                5      0.1%
0.06                6      0.1%
0.07                12     0.2%
0.08                4      0.1%
0.09                12     0.2%

Largest values

Value               Count  Frequency (%)
1.66                1      < 0.1%
1.23                1      < 0.1%
1                   5      0.1%
0.99                1      < 0.1%
0.91                2      < 0.1%
0.88                1      < 0.1%
0.86                1      < 0.1%
0.82                2      < 0.1%
0.81                2      < 0.1%
0.8                 2      < 0.1%

residual_sugar
Real number (ℝ≥0)

Distinct count: 310
Unique (%): 6.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 6.391414863209474
Minimum: 0.6
Maximum: 65.8
Zeros: 0
Zeros (%): 0.0%
Memory size: 38.4 KiB

Quantile statistics

Minimum: 0.6
5-th percentile: 1.1
Q1: 1.7
Median: 5.2
Q3: 9.9
95-th percentile: 15.7
Maximum: 65.8
Range: 65.2
Interquartile range (IQR): 8.2

Descriptive statistics

Standard deviation: 5.072057784
Coefficient of variation (CV): 0.7935735502
Kurtosis: 3.469820103
Mean: 6.391414863
Median Absolute Deviation (MAD): 3.6
Skewness: 1.077093756
Sum: 31305.15
Variance: 25.72577016
[Figure: histogram with fixed-size bins (bins=10)]

Most frequent values

Value               Count  Frequency (%)
1.2                 187    3.8%
1.4                 184    3.8%
1.6                 165    3.4%
1.3                 147    3.0%
1.1                 146    3.0%
1.5                 142    2.9%
1.8                 99     2.0%
1.7                 99     2.0%
1                   93     1.9%
2                   79     1.6%
1.9                 59     1.2%
2.2                 56     1.1%
2.1                 51     1.0%
5                   43     0.9%
2.3                 42     0.9%
7.8                 41     0.8%
2.4                 41     0.8%
4.6                 40     0.8%
7.4                 40     0.8%
2.5                 40     0.8%
0.9                 39     0.8%
6.3                 39     0.8%
2.7                 38     0.8%
4.8                 38     0.8%
2.8                 36     0.7%
Other values (285)  2914   59.5%

Smallest values

Value               Count  Frequency (%)
0.6                 2      < 0.1%
0.7                 7      0.1%
0.8                 25     0.5%
0.9                 39     0.8%
0.95                4      0.1%
1                   93     1.9%
1.05                1      < 0.1%
1.1                 146    3.0%
1.15                3      0.1%
1.2                 187    3.8%

Largest values

Value               Count  Frequency (%)
65.8                1      < 0.1%
31.6                2      < 0.1%
26.05               2      < 0.1%
23.5                1      < 0.1%
22.6                1      < 0.1%
22                  2      < 0.1%
20.8                2      < 0.1%
20.7                2      < 0.1%
20.4                1      < 0.1%
20.3                1      < 0.1%

chlorides
Real number (ℝ≥0)

Distinct count: 160
Unique (%): 3.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.04577235606369947
Minimum: 0.009
Maximum: 0.346
Zeros: 0
Zeros (%): 0.0%
Memory size: 38.4 KiB

Quantile statistics

Minimum: 0.009
5-th percentile: 0.027
Q1: 0.036
Median: 0.043
Q3: 0.05
95-th percentile: 0.067
Maximum: 0.346
Range: 0.337
Interquartile range (IQR): 0.014

Descriptive statistics

Standard deviation: 0.02184796809
Coefficient of variation (CV): 0.4773179703
Kurtosis: 37.56459971
Mean: 0.04577235606
Median Absolute Deviation (MAD): 0.007
Skewness: 5.023330683
Sum: 224.193
Variance: 0.0004773337098
[Figure: histogram with fixed-size bins (bins=10)]

Most frequent values

Value               Count  Frequency (%)
0.044               201    4.1%
0.036               200    4.1%
0.042               184    3.8%
0.04                182    3.7%
0.046               181    3.7%
0.048               174    3.6%
0.047               171    3.5%
0.05                170    3.5%
0.045               170    3.5%
0.034               168    3.4%
0.038               167    3.4%
0.037               160    3.3%
0.039               157    3.2%
0.041               147    3.0%
0.043               141    2.9%
0.049               133    2.7%
0.035               130    2.7%
0.053               130    2.7%
0.033               119    2.4%
0.051               115    2.3%
0.032               109    2.2%
0.03                108    2.2%
0.031               107    2.2%
0.052               104    2.1%
0.054               99     2.0%
Other values (135)  1171   23.9%

Smallest values

Value               Count  Frequency (%)
0.009               1      < 0.1%
0.012               1      < 0.1%
0.013               1      < 0.1%
0.014               4      0.1%
0.015               4      0.1%
0.016               5      0.1%
0.017               5      0.1%
0.018               10     0.2%
0.019               9      0.2%
0.02                16     0.3%

Largest values

Value               Count  Frequency (%)
0.346               1      < 0.1%
0.301               1      < 0.1%
0.29                1      < 0.1%
0.271               1      < 0.1%
0.255               1      < 0.1%
0.244               1      < 0.1%
0.24                1      < 0.1%
0.239               1      < 0.1%
0.217               1      < 0.1%
0.212               1      < 0.1%

free_sulfur_dioxide
Real number (ℝ≥0)

Distinct count: 132
Unique (%): 2.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 35.30808493262556
Minimum: 2.0
Maximum: 289.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 38.4 KiB

Quantile statistics

Minimum: 2
5-th percentile: 11
Q1: 23
Median: 34
Q3: 46
95-th percentile: 63
Maximum: 289
Range: 287
Interquartile range (IQR): 23

Descriptive statistics

Standard deviation: 17.00713733
Coefficient of variation (CV): 0.4816782716
Kurtosis: 11.46634243
Mean: 35.30808493
Median Absolute Deviation (MAD): 11
Skewness: 1.406744921
Sum: 172939
Variance: 289.24272
[Figure: histogram with fixed-size bins (bins=10)]

Most frequent values

Value               Count  Frequency (%)
29                  160    3.3%
31                  132    2.7%
26                  129    2.6%
35                  129    2.6%
34                  128    2.6%
36                  127    2.6%
24                  118    2.4%
28                  112    2.3%
33                  112    2.3%
37                  111    2.3%
25                  111    2.3%
23                  110    2.2%
32                  109    2.2%
41                  104    2.1%
40                  103    2.1%
22                  102    2.1%
38                  102    2.1%
20                  101    2.1%
45                  101    2.1%
27                  99     2.0%
30                  99     2.0%
21                  93     1.9%
47                  91     1.9%
39                  89     1.8%
17                  89     1.8%
Other values (107)  2137   43.6%

Smallest values

Value               Count  Frequency (%)
2                   1      < 0.1%
3                   10     0.2%
4                   11     0.2%
5                   25     0.5%
6                   32     0.7%
7                   25     0.5%
8                   35     0.7%
9                   29     0.6%
10                  55     1.1%
11                  45     0.9%

Largest values

Value               Count  Frequency (%)
289                 1      < 0.1%
146.5               1      < 0.1%
138.5               1      < 0.1%
131                 1      < 0.1%
128                 1      < 0.1%
124                 1      < 0.1%
122.5               1      < 0.1%
118.5               1      < 0.1%
112                 1      < 0.1%
110                 1      < 0.1%

total_sulfur_dioxide
Real number (ℝ≥0)

Distinct count: 251
Unique (%): 5.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 138.36065741118824
Minimum: 9.0
Maximum: 440.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 38.4 KiB

Quantile statistics

Minimum: 9
5-th percentile: 75
Q1: 108
Median: 134
Q3: 167
95-th percentile: 212
Maximum: 440
Range: 431
Interquartile range (IQR): 59

Descriptive statistics

Standard deviation: 42.49806455
Coefficient of variation (CV): 0.3071542543
Kurtosis: 0.5718532334
Mean: 138.3606574
Median Absolute Deviation (MAD): 29
Skewness: 0.3907098417
Sum: 677690.5
Variance: 1806.085491
[Figure: histogram with fixed-size bins (bins=10)]

Most frequent values

Value               Count  Frequency (%)
111                 69     1.4%
113                 61     1.2%
117                 57     1.2%
118                 55     1.1%
128                 54     1.1%
122                 54     1.1%
114                 54     1.1%
150                 54     1.1%
124                 53     1.1%
140                 52     1.1%
126                 50     1.0%
133                 50     1.0%
98                  49     1.0%
125                 49     1.0%
149                 48     1.0%
132                 47     1.0%
119                 47     1.0%
131                 47     1.0%
134                 47     1.0%
110                 47     1.0%
156                 47     1.0%
116                 47     1.0%
101                 47     1.0%
130                 46     0.9%
142                 46     0.9%
Other values (226)  3621   73.9%

Smallest values

Value               Count  Frequency (%)
9                   1      < 0.1%
10                  1      < 0.1%
18                  2      < 0.1%
19                  1      < 0.1%
21                  1      < 0.1%
24                  3      0.1%
25                  1      < 0.1%
26                  1      < 0.1%
28                  4      0.1%
29                  2      < 0.1%

Largest values

Value               Count  Frequency (%)
440                 1      < 0.1%
366.5               1      < 0.1%
344                 1      < 0.1%
313                 1      < 0.1%
307.5               1      < 0.1%
303                 1      < 0.1%
294                 1      < 0.1%
282                 1      < 0.1%
272                 2      < 0.1%
260                 1      < 0.1%

density
Real number (ℝ≥0)

Distinct count: 890
Unique (%): 18.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.9940273764801959
Minimum: 0.98711
Maximum: 1.03898
Zeros: 0
Zeros (%): 0.0%
Memory size: 38.4 KiB

Quantile statistics

Minimum: 0.98711
5-th percentile: 0.9896385
Q1: 0.9917225
Median: 0.99374
Q3: 0.9961
95-th percentile: 0.999
Maximum: 1.03898
Range: 0.05187
Interquartile range (IQR): 0.0043775

Descriptive statistics

Standard deviation: 0.002990906917
Coefficient of variation (CV): 0.003008877811
Kurtosis: 9.793806911
Mean: 0.9940273765
Median Absolute Deviation (MAD): 0.00214
Skewness: 0.9777730049
Sum: 4868.74609
Variance: 8.945524186e-06
[Figure: histogram with fixed-size bins (bins=10)]

Most frequent values

Value               Count  Frequency (%)
0.992               64     1.3%
0.9928              61     1.2%
0.9932              53     1.1%
0.993               52     1.1%
0.9934              50     1.0%
0.9938              49     1.0%
0.9927              47     1.0%
0.9944              46     0.9%
0.9948              45     0.9%
0.9954              44     0.9%
0.9924              44     0.9%
0.9986              42     0.9%
0.9956              41     0.8%
0.9918              40     0.8%
0.9958              40     0.8%
0.9914              39     0.8%
0.9942              38     0.8%
0.9952              37     0.8%
0.994               37     0.8%
0.9966              36     0.7%
0.9937              35     0.7%
0.998               35     0.7%
0.9917              34     0.7%
0.9936              34     0.7%
0.9976              34     0.7%
Other values (865)  3821   78.0%

Smallest values

Value               Count  Frequency (%)
0.98711             1      < 0.1%
0.98713             1      < 0.1%
0.98722             1      < 0.1%
0.9874              1      < 0.1%
0.98742             2      < 0.1%
0.98746             2      < 0.1%
0.98758             1      < 0.1%
0.98774             1      < 0.1%
0.98779             1      < 0.1%
0.98794             2      < 0.1%

Largest values

Value               Count  Frequency (%)
1.03898             1      < 0.1%
1.0103              2      < 0.1%
1.00295             2      < 0.1%
1.00241             1      < 0.1%
1.0024              1      < 0.1%
1.00196             1      < 0.1%
1.00182             1      < 0.1%
1.0017              2      < 0.1%
1.0012              1      < 0.1%
1.00118             1      < 0.1%

pH
Real number (ℝ≥0)

Distinct count: 103
Unique (%): 2.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.1882666394446715
Minimum: 2.72
Maximum: 3.82
Zeros: 0
Zeros (%): 0.0%
Memory size: 38.4 KiB

Quantile statistics

Minimum: 2.72
5-th percentile: 2.96
Q1: 3.09
Median: 3.18
Q3: 3.28
95-th percentile: 3.46
Maximum: 3.82
Range: 1.1
Interquartile range (IQR): 0.19

Descriptive statistics

Standard deviation: 0.1510005996
Coefficient of variation (CV): 0.04736134605
Kurtosis: 0.5307749515
Mean: 3.188266639
Median Absolute Deviation (MAD): 0.1
Skewness: 0.4577825459
Sum: 15616.13
Variance: 0.02280118108
[Figure: histogram with fixed-size bins (bins=10)]

Most frequent values

Value               Count  Frequency (%)
3.14                172    3.5%
3.16                164    3.3%
3.22                146    3.0%
3.19                145    3.0%
3.18                138    2.8%
3.2                 137    2.8%
3.08                136    2.8%
3.15                136    2.8%
3.1                 135    2.8%
3.12                134    2.7%
3.24                132    2.7%
3.11                126    2.6%
3.17                124    2.5%
3.13                117    2.4%
3.23                116    2.4%
3.06                115    2.3%
3.25                114    2.3%
3.04                97     2.0%
3.26                96     2.0%
3.21                95     1.9%
3.3                 93     1.9%
3.09                92     1.9%
3.05                89     1.8%
3.27                88     1.8%
3.28                87     1.8%
Other values (78)   1874   38.3%

Smallest values

Value               Count  Frequency (%)
2.72                1      < 0.1%
2.74                1      < 0.1%
2.77                1      < 0.1%
2.79                3      0.1%
2.8                 3      0.1%
2.82                1      < 0.1%
2.83                4      0.1%
2.84                1      < 0.1%
2.85                9      0.2%
2.86                9      0.2%

Largest values

Value               Count  Frequency (%)
3.82                1      < 0.1%
3.81                1      < 0.1%
3.8                 2      < 0.1%
3.79                1      < 0.1%
3.77                2      < 0.1%
3.76                2      < 0.1%
3.75                2      < 0.1%
3.74                2      < 0.1%
3.72                3      0.1%
3.7                 1      < 0.1%

sulphates
Real number (ℝ≥0)

Distinct count: 79
Unique (%): 1.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.48984687627603113
Minimum: 0.22
Maximum: 1.08
Zeros: 0
Zeros (%): 0.0%
Memory size: 38.4 KiB

Quantile statistics

Minimum: 0.22
5-th percentile: 0.34
Q1: 0.41
Median: 0.47
Q3: 0.55
95-th percentile: 0.71
Maximum: 1.08
Range: 0.86
Interquartile range (IQR): 0.14

Descriptive statistics

Standard deviation: 0.1141258339
Coefficient of variation (CV): 0.2329826717
Kurtosis: 1.59092963
Mean: 0.4898468763
Median Absolute Deviation (MAD): 0.07
Skewness: 0.9771936833
Sum: 2399.27
Variance: 0.01302470597
[Figure: histogram with fixed-size bins (bins=10)]

Most frequent values

Value               Count  Frequency (%)
0.5                 249    5.1%
0.46                225    4.6%
0.44                216    4.4%
0.38                214    4.4%
0.42                181    3.7%
0.48                179    3.7%
0.45                178    3.6%
0.47                172    3.5%
0.4                 168    3.4%
0.54                167    3.4%
0.49                166    3.4%
0.43                161    3.3%
0.52                156    3.2%
0.39                151    3.1%
0.51                140    2.9%
0.41                139    2.8%
0.53                135    2.8%
0.37                129    2.6%
0.36                120    2.4%
0.56                108    2.2%
0.55                102    2.1%
0.58                99     2.0%
0.59                97     2.0%
0.6                 88     1.8%
0.35                85     1.7%
Other values (54)   1073   21.9%

Smallest values

Value               Count  Frequency (%)
0.22                1      < 0.1%
0.23                1      < 0.1%
0.25                4      0.1%
0.26                4      0.1%
0.27                13     0.3%
0.28                13     0.3%
0.29                16     0.3%
0.3                 31     0.6%
0.31                35     0.7%
0.32                54     1.1%

Largest values

Value               Count  Frequency (%)
1.08                1      < 0.1%
1.06                1      < 0.1%
1.01                1      < 0.1%
1                   1      < 0.1%
0.99                1      < 0.1%
0.98                6      0.1%
0.97                1      < 0.1%
0.96                3      0.1%
0.95                5      0.1%
0.94                2      < 0.1%

alcohol
Real number (ℝ≥0)

Distinct count: 103
Unique (%): 2.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 10.514267047774602
Minimum: 8.0
Maximum: 14.2
Zeros: 0
Zeros (%): 0.0%
Memory size: 38.4 KiB

Quantile statistics

Minimum: 8
5-th percentile: 8.9
Q1: 9.5
Median: 10.4
Q3: 11.4
95-th percentile: 12.7
Maximum: 14.2
Range: 6.2
Interquartile range (IQR): 1.9

Descriptive statistics

Standard deviation: 1.230620568
Coefficient of variation (CV): 0.1170429248
Kurtosis: -0.6984253278
Mean: 10.51426705
Median Absolute Deviation (MAD): 1
Skewness: 0.4873419932
Sum: 51498.88
Variance: 1.514426982
[Figure: histogram with fixed-size bins (bins=10)]

Most frequent values

Value               Count  Frequency (%)
9.4                 229    4.7%
9.5                 228    4.7%
9.2                 199    4.1%
9                   185    3.8%
10                  162    3.3%
10.5                160    3.3%
11                  158    3.2%
10.4                153    3.1%
9.1                 144    2.9%
9.8                 136    2.8%
10.8                135    2.8%
9.3                 134    2.7%
10.2                130    2.7%
9.6                 128    2.6%
11.4                121    2.5%
10.1                114    2.3%
10.6                114    2.3%
11.2                112    2.3%
9.9                 109    2.2%
8.8                 107    2.2%
9.7                 105    2.1%
12                  102    2.1%
11.3                101    2.1%
10.7                96     2.0%
8.9                 95     1.9%
Other values (78)   1441   29.4%

Smallest values

Value               Count  Frequency (%)
8                   2      < 0.1%
8.4                 3      0.1%
8.5                 9      0.2%
8.6                 23     0.5%
8.7                 78     1.6%
8.8                 107    2.2%
8.9                 95     1.9%
9                   185    3.8%
9.1                 144    2.9%
9.2                 199    4.1%

Largest values

Value               Count  Frequency (%)
14.2                1      < 0.1%
14.05               1      < 0.1%
14                  5      0.1%
13.9                3      0.1%
13.8                2      < 0.1%
13.7                7      0.1%
13.6                9      0.2%
13.55               1      < 0.1%
13.5                12     0.2%
13.4                20     0.4%

quality
Real number (ℝ≥0)

Distinct count: 7
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 5.87790935075541
Minimum: 3
Maximum: 9
Zeros: 0
Zeros (%): 0.0%
Memory size: 38.4 KiB

Quantile statistics

Minimum: 3
5-th percentile: 5
Q1: 5
Median: 6
Q3: 6
95-th percentile: 7
Maximum: 9
Range: 6
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 0.885638575
Coefficient of variation (CV): 0.1506723772
Kurtosis: 0.2165258272
Mean: 5.877909351
Median Absolute Deviation (MAD): 1
Skewness: 0.1557963977
Sum: 28790
Variance: 0.7843556855
[Figure: histogram with fixed-size bins (bins=10)]

Most frequent values

Value               Count  Frequency (%)
6                   2198   44.9%
5                   1457   29.7%
7                   880    18.0%
8                   175    3.6%
4                   163    3.3%
3                   20     0.4%
9                   5      0.1%

Smallest values

Value               Count  Frequency (%)
3                   20     0.4%
4                   163    3.3%
5                   1457   29.7%
6                   2198   44.9%
7                   880    18.0%
8                   175    3.6%
9                   5      0.1%

Largest values

Value               Count  Frequency (%)
9                   5      0.1%
8                   175    3.6%
7                   880    18.0%
6                   2198   44.9%
5                   1457   29.7%
4                   163    3.3%
3                   20     0.4%

Interactions

Correlations

Pearson's r

Pearson's correlation coefficient (r) measures the linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, so for a linear relationship the steepness of the line, that is, its angle to the x-axis, does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
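
As a minimal sketch of that calculation, the standard-library function below divides the covariance by the product of the standard deviations. The name `pearson_r` and the sample data are illustrative assumptions, not the report's own implementation. The common 1/(n-1) factors cancel, so plain sums of squares are used.

```python
import math


def pearson_r(x, y):
    """Pearson's r: covariance of x and y over the product of their standard deviations."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # The 1/(n-1) factors in covariance and standard deviations cancel out.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    # Raises ZeroDivisionError if either variable is constant.
    return cov / (spread_x * spread_y)


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0 * a + 1.0 for a in x]            # exactly linear in x
shifted = [10.0 * a + 3.0 for a in x]     # location and scale changed

print(pearson_r(x, y))
print(pearson_r(shifted, y))
```

Both calls give the same value up to floating-point rounding, which illustrates the invariance under changes of location and scale.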

Spearman's ρ

Spearman's rank correlation coefficient (ρ) measures the monotonic correlation between two variables, and is therefore better than Pearson's r at capturing nonlinear monotonic correlations. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of the standard deviations of those rank variables.
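
A rough standard-library sketch of this calculation follows: each variable is replaced by its ranks, then Pearson's formula is applied to the ranks. Giving tied values the average of their positions is an assumption made here, and the helper names are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson_r(x, y):
    """Covariance over the product of standard deviations (1/(n-1) cancels)."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def spearman_rho(x, y):
    """Spearman's rho: Pearson's r computed on the rank variables."""
    return pearson_r(average_ranks(x), average_ranks(y))


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [a ** 3 for a in x]      # monotonic but not linear in x

print(spearman_rho(x, y))    # ranks agree exactly
print(pearson_r(x, y))       # below 1 because the relation is not linear
```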

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
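
The pair counting can be sketched directly with the standard library. The function below follows the formula exactly as stated; counting a tied pair as neither concordant nor discordant is an assumption made here, and library implementations may use tie-adjusted variants instead. The name `kendall_tau` is illustrative.

```python
from itertools import combinations


def kendall_tau(x, y):
    """(concordant - discordant) / total pairs; tied pairs count as neither."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length of at least 2")
    concordant = 0
    discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1   # both variables move in the same direction
        elif direction < 0:
            discordant += 1   # the variables move in opposite directions
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs


# Three observations give three pairs: two concordant, one discordant.
print(kendall_tau([1, 2, 3], [1, 3, 2]))
```

This approach compares every pair, so its cost grows quadratically with the number of observations.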

Phik (φk)

Phik (φk) is a correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient for a bivariate normal input distribution. Extensive documentation is available from the Phik project.

Missing values

Sample

First rows

      fixed_acidity  volatile_acidity  citric_acid  residual_sugar  chlorides  free_sulfur_dioxide  total_sulfur_dioxide  density  pH    sulphates  alcohol  quality
0     7.0            0.27              0.36         20.7            0.045      45.0                 170.0                 1.0010   3.00  0.45       8.8      6
1     6.3            0.30              0.34         1.6             0.049      14.0                 132.0                 0.9940   3.30  0.49       9.5      6
2     8.1            0.28              0.40         6.9             0.050      30.0                 97.0                  0.9951   3.26  0.44       10.1     6
3     7.2            0.23              0.32         8.5             0.058      47.0                 186.0                 0.9956   3.19  0.40       9.9      6
4     7.2            0.23              0.32         8.5             0.058      47.0                 186.0                 0.9956   3.19  0.40       9.9      6
5     8.1            0.28              0.40         6.9             0.050      30.0                 97.0                  0.9951   3.26  0.44       10.1     6
6     6.2            0.32              0.16         7.0             0.045      30.0                 136.0                 0.9949   3.18  0.47       9.6      6
7     7.0            0.27              0.36         20.7            0.045      45.0                 170.0                 1.0010   3.00  0.45       8.8      6
8     6.3            0.30              0.34         1.6             0.049      14.0                 132.0                 0.9940   3.30  0.49       9.5      6
9     8.1            0.22              0.43         1.5             0.044      28.0                 129.0                 0.9938   3.22  0.45       11.0     6

Last rows

      fixed_acidity  volatile_acidity  citric_acid  residual_sugar  chlorides  free_sulfur_dioxide  total_sulfur_dioxide  density  pH    sulphates  alcohol  quality
4888  6.8            0.220             0.36         1.20            0.052      38.0                 127.0                 0.99330  3.04  0.54       9.2      5
4889  4.9            0.235             0.27         11.75           0.030      34.0                 118.0                 0.99540  3.07  0.50       9.4      6
4890  6.1            0.340             0.29         2.20            0.036      25.0                 100.0                 0.98938  3.06  0.44       11.8     6
4891  5.7            0.210             0.32         0.90            0.038      38.0                 121.0                 0.99074  3.24  0.46       10.6     6
4892  6.5            0.230             0.38         1.30            0.032      29.0                 112.0                 0.99298  3.29  0.54       9.7      5
4893  6.2            0.210             0.29         1.60            0.039      24.0                 92.0                  0.99114  3.27  0.50       11.2     6
4894  6.6            0.320             0.36         8.00            0.047      57.0                 168.0                 0.99490  3.15  0.46       9.6      5
4895  6.5            0.240             0.19         1.20            0.041      30.0                 111.0                 0.99254  2.99  0.46       9.4      6
4896  5.5            0.290             0.30         1.10            0.022      20.0                 110.0                 0.98869  3.34  0.38       12.8     7
4897  6.0            0.210             0.38         0.80            0.020      22.0                 98.0                  0.98941  3.26  0.32       11.8     6

Duplicate rows

Most frequent

      fixed_acidity  volatile_acidity  citric_acid  residual_sugar  chlorides  free_sulfur_dioxide  total_sulfur_dioxide  density  pH    sulphates  alcohol  quality  count
423   7.0            0.15              0.28         14.7            0.051      29.0                 149.0                 0.99792  2.96  0.39       9.0      7        8
557   7.3            0.19              0.27         13.9            0.057      45.0                 155.0                 0.99807  2.94  0.41       8.8      8        8
335   6.8            0.18              0.30         12.8            0.062      19.0                 171.0                 0.99808  3.00  0.52       9.0      7        7
589   7.4            0.16              0.30         13.7            0.056      33.0                 168.0                 0.99825  2.90  0.44       8.7      7        7
588   7.4            0.16              0.27         15.5            0.050      25.0                 135.0                 0.99840  2.90  0.43       8.7      7        6
592   7.4            0.19              0.30         12.8            0.053      48.5                 229.0                 0.99860  3.14  0.49       9.1      7        6
593   7.4            0.19              0.31         14.5            0.045      39.0                 193.0                 0.99860  3.10  0.50       9.2      6        6
641   7.6            0.20              0.30         14.2            0.056      53.0                 212.5                 0.99900  3.14  0.46       8.9      8        6
28    5.7            0.22              0.20         16.0            0.044      41.0                 113.0                 0.99862  3.22  0.46       8.9      6        5
110   6.2            0.23              0.36         17.2            0.039      37.0                 130.0                 0.99946  3.23  0.43       8.8      6        5